Deep learning classification of active tuberculosis lung zones wise manifestations using chest X-rays: a multi label approach

Chest X-rays are the most economically viable diagnostic imaging test for active pulmonary tuberculosis screening, despite their high sensitivity and low specificity when interpreted by clinicians or radiologists. Computer aided detection (CAD) algorithms, especially convolution based deep learning architectures, have been proposed to automate the interpretation of radiographic images. Deep learning algorithms have found success in classifying various lung abnormalities on chest X-rays. We fine-tuned, validated and tested the EfficientNetB4 architecture, using transfer learning in a multi-label approach, to detect lung zone wise and image wise manifestations of active pulmonary tuberculosis on chest X-rays. We used the area under the receiver operating characteristic curve (AUC), sensitivity and specificity, each with a 95% confidence interval, as model evaluation metrics. We also used Gradient-weighted Class Activation Mapping (Grad-CAM), a post-hoc attention method for convolutional neural networks (CNN), to investigate the model, visualise tuberculosis abnormalities and discuss them from a radiological perspective. The trained EfficientNetB4 network achieved remarkable AUC, sensitivity and specificity for various pulmonary tuberculosis manifestations on the intramural test set and on an external test set from a different geographical region. The Grad-CAM visualisations and their ability to localise the abnormalities can aid clinicians in primary care settings in screening and triaging tuberculosis where resources are constrained or overburdened.

shows the accuracy, precision and F1 score of 44 manifestations of intramural holdout test set. Accuracy varies between 71 and 99%, with 50% (n = 22) of abnormalities having accuracy above 90%. Table S3 shows that accuracy ranges from 64 to 100%; 72% of abnormalities scored accuracy > 90%, with nearly 30% (n = 14) of abnormalities classified with 100% accuracy. Due to the lack of true positive data points, precision and F1 score were undefined for many abnormalities. ROC curves for lung zone wise and lung wise abnormalities are also shown for the intramural test set (Fig. 2) and the external test set (Fig. 3).
www.nature.com/scientificreports/
Grad-CAM analysis. In this section we examine how well the Grad-CAM images visualise the pathological changes in the CXR (ground truth), and thereby the signals of tuberculosis. Grad-CAMs can be generated at any layer in the network, and offer better localisation than saliency maps when taken in deeper layers 27 . The samples presented in this section were generated with the network trained on the dataset with 44 classes. Supplementary Fig. S2 provides the Grad-CAM images of active tuberculosis manifestations generated with the network trained on the dataset with lung zone wise and lung wise manifestations, to illustrate the usefulness of the heatmap images and their coarse localisation.
The sample in the third row of Fig. 4 shows a 30-year-old female patient with confirmed active pulmonary tuberculosis. The ground truth shows that the pathological changes in the X-ray are Cavitation in the bilateral upper zones and left mid zone, Opacity in the bilateral upper and mid zones, and Pleural Thickening on the left lung. The network predicted all ground truth classes and, in addition, a Cavity lesion in the right mid zone (score 0.58) and right lung Pleural Thickening (score 0.94), both of which were verified and agreed by the radiologist.
The heatmap produced by the algorithm was analysed by the radiologist and aligned with the ground truth. Some overestimation in the right lower zone was noticed in the Grad-CAM heatmap. The sample in the third row of Fig. 5 shows the chest X-ray of a 63-year-old female with active cavitary TB. Findings show large infiltration with cavitation in the right upper lobe and signs of infiltration in the right middle lobe. The algorithm's prediction aligned with the ground truth, with cavity in the right upper zone (RUZ) and right mid zone (RMZ) (scores of 0.68 and 0.66 respectively) and opacity in RUZ and RMZ (scores of 0.87 and 0.92 respectively). The radiologist who verified the Grad-CAM heatmap noted that the heat spreading to the left lung is noise and that the heatmap pointed to the cavitary area. It was also noted that the spread of heat from RMZ to the right lower zone (RLZ) amounts to an overestimation of the abnormality.

Discussion
Tuberculosis in adults manifests as assorted findings on chest X-rays, often cavitation and ill-defined opacity in the apical zone and broncho-pulmonary segments of the upper and lower lobes. Cavitation is the hallmark of adult active TB and is present in half of the cases 2,3,28,29 . In this study, the EfficientNetB4 architecture for multi-label classification of TB specific abnormalities, combined with Grad-CAM visualisation, performed well in localising the varied manifestations of TB.
In an accuracy study conducted in a tertiary hospital in India 30 Our work achieved superior sensitivity, specificity and precision compared with those of the validation set in the feasibility study 31 . For Cavity, our network achieved a sensitivity of 82% while ViDi scored 50%; specificity was 94% for ours and 92% for ViDi; and precision was 95% for ours while ViDi obtained 25%. Similarly, for Opacity, sensitivity was 100% vs 74%, specificity 50% vs 76% and precision 100% vs 74%. For Pleural Effusion, the results are on par with our findings: sensitivity 90% vs 100%, specificity 100% for both networks, and precision 95% vs 100% 31 .
The external test set was taken from a different geographical region than the training, validation and test sets. The external validation results show that the model generalises well in predicting specific abnormalities consistent with active TB, e.g. cavity, opacity and pleural effusion. The lack of labelled tuberculosis datasets affected the performance metrics in external validation. The available deep learning literature covers either binary classification of tuberculosis or image wise multi-label classification of thoracic abnormalities. To our knowledge, there is no published article on lung zone wise classification of active pulmonary tuberculosis manifestations.
The Grad-CAM analysis shows that the Grad-CAM focus is indeed on the areas affected by active tuberculosis manifestations. This localisation heatmap is notable for its detail and can be used in clinical settings to aid clinicians in primary health care, where the service of an expert radiologist is often lacking or unavailable.

Materials and methodology
Study design and goal. This was a retrospective study in which a classification model was built using transfer learning to detect the various manifestations of active tuberculosis from CXR. This was followed by generation of Grad-CAM heatmaps and their comparison with radiologist annotated ground truth.

Dataset sources and curation for model development.
We retrospectively collected chest X-ray images and patient demography from the National TB Elimination Programme referral register in a specific DMC from 2017 to 2020. Patients were confirmed as having active TB either by a physician or by microbiological examination. Posteroanterior (PA) chest X-rays were downloaded from the Picture Archival and Communication System (PACS), which uses a Health Level Seven International (HL7) integration between PACS and the Health Information System. A unique patient identifier was used to extract the images from PACS where the date of the CXR was close to the date of the sputum examination result. If multiple CXR were available close to the lab result date, we collected all available CXR and numbered them sequentially by date to follow the prognosis/progress of TB. All CXR used in the study were de-identified using a system generated study identifier, and any overlay information in the CXR was removed to protect the privacy of the patients.
There were no missing data on patient demographics or CXR. Images were in Tag Image File Format with 24 bits per pixel. The Digital Radiography modality was used with a resolution of 1530 × 1896. All CXR were taken with a portable Samsung Retrofit DR System. We collected 1350 CXR from 858 patients confirmed with active TB. We

Dataset sources and curation for external validation. Publicly available, Health Insurance Portability and Accountability Act (HIPAA) compliant datasets maintained by the National Library of Medicine, Maryland, USA were used as the external test set: the Department of Health and Human Services, Montgomery County, Maryland, USA (MC) collection 33 . The MC collection is in Portable Network Graphics (PNG) format and includes chest X-rays of 58 TB patients. We curated 36 abnormal CXR for external test set validation after excluding pediatric, pneumonia and old/inactive TB images. The mean age in the MC collection was 47.5 years (SD 20.6) and 33.3% were female.
Ground truth. All PA TB CXR were read by a single radiologist (HG) for various active TB manifestations and reported in a specified format approved by the study team. Peer validation of 10% of the CXR was done by a radiologist (SA) and a pulmonologist (MR). The inter-observer agreement between the reader and the two peer assessors (SA and MR) was almost perfect and moderate (k = 0.83, [95% CI: 0.72-0.93] and k = 0.80, [95% CI: 0.71-0.94] respectively) 34 .
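To illustrate the agreement statistic reported above, the following minimal sketch computes Cohen's kappa for two readers. It assumes that k denotes Cohen's kappa and that agreement is assessed on present/absent labels for a finding; the function name and the toy readings are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must label the same non-empty set of items")
    n = len(reader_a)
    # Observed agreement: fraction of items given the same label by both readers.
    p_observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: product of the readers' marginal label frequencies.
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both readers used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical readings for one finding (1 = present, 0 = absent).
reader_1 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
reader_2 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
kappa = cohen_kappa(reader_1, reader_2)
```

In this toy case the readers agree on nine of ten images while chance agreement is 0.5, so kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8.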
In the present study we excluded abnormalities that occurred in less than 10% of data points from the training, validation and testing datasets. The final dataset contains X-rays with 12 image wise abnormality labels and 44 zone wise and lung wise labels: 30 abnormalities in the bilateral upper, middle and lower zones and 14 abnormalities reported as left and right lung findings. Ground truth of the MC collection was obtained from the dataset source.
Data partitions and augmentation. The dataset was split into training (80%) and test (20%) sets. We further split the training set into training (64%) and validation (16%) sets. Repeated images of the same patient were included only in the training set to avoid data leakage into the validation and holdout test sets. We performed offline augmentation on the training set alone with the following parameters: (1) random rotation with a probability of 1 and a rotation range of ± 7 degrees excluding zero; (2) random brightness with a probability of 1, adjusting the value channel "v" of the image in "hsv" format using the NumPy random uniform function.
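The two augmentation parameters described above can be sketched as follows. This is a minimal illustration, not the study's code: the brightness range (0.8 to 1.2), the function names and the use of NumPy's Generator interface are assumptions, and the rotation itself is left to an image library of choice.

```python
import numpy as np


def sample_rotation_angle(rng, max_degrees=7.0):
    """Draw an angle in [-max_degrees, max_degrees], redrawing an exact zero."""
    angle = 0.0
    while angle == 0.0:
        angle = float(rng.uniform(-max_degrees, max_degrees))
    return angle


def random_brightness(rgb, rng, low=0.8, high=1.2):
    """Scale the HSV value channel of an RGB image by a uniform random factor.

    rgb is a float array in [0, 1] with shape (H, W, 3). With hue and
    saturation held fixed, scaling V is equivalent to scaling R, G and B by
    new_V / old_V, so no explicit HSV conversion is needed.
    The range (low, high) is an assumed value, not taken from the study.
    """
    factor = rng.uniform(low, high)
    value = rgb.max(axis=-1, keepdims=True)        # V = max(R, G, B)
    new_value = np.clip(value * factor, 0.0, 1.0)  # keep V inside [0, 1]
    ratio = np.divide(new_value, value, out=np.ones_like(value), where=value > 0)
    return rgb * ratio


rng = np.random.default_rng(seed=0)
image = rng.random((380, 380, 3))      # stand-in for a training image
augmented = random_brightness(image, rng)
angle = sample_rotation_angle(rng)     # degrees to pass to a rotation routine
```

Because the augmentation is offline, each augmented copy would be written to disk once and reused across epochs rather than regenerated per batch.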
Active TB manifestation classification model (Fig. 6). The desktop computer was equipped with an Intel i9-9820X CPU @3.30 GHz, 64 GB RAM and dual NVIDIA GeForce RTX 2080Ti GPUs (11 GB each). For the Region of Interest (lung area) segmentation task, we used U-Net, a deep convolutional neural network architecture that has shown remarkable accuracy in medical image segmentation tasks. The segmented lung area was padded by 50 pixels to retain the lung border, extracted at a size of 380 × 380 and saved in Joint Photographic Experts Group (JPEG) format. The saved images were then partitioned, augmented and fed into the classification model for training, validation and testing.
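The padding and extraction step described above could look like the sketch below, which takes a binary lung mask (as a U-Net would produce), expands its bounding box by 50 pixels and resizes the crop to 380 × 380. The function names are hypothetical, and nearest-neighbour resizing is used only to keep the example free of extra dependencies; the interpolation used in the study is not stated.

```python
import numpy as np


def crop_lung_region(image, lung_mask, pad=50):
    """Crop image to the bounding box of lung_mask, expanded by pad pixels."""
    rows = np.flatnonzero(lung_mask.any(axis=1))
    cols = np.flatnonzero(lung_mask.any(axis=0))
    if rows.size == 0:
        raise ValueError("lung mask is empty")
    # Clamp the padded box to the image borders.
    top = max(int(rows[0]) - pad, 0)
    bottom = min(int(rows[-1]) + pad + 1, image.shape[0])
    left = max(int(cols[0]) - pad, 0)
    right = min(int(cols[-1]) + pad + 1, image.shape[1])
    return image[top:bottom, left:right]


def resize_nearest(image, size=(380, 380)):
    """Nearest-neighbour resize to (rows, cols) using integer index maps."""
    row_idx = np.arange(size[0]) * image.shape[0] // size[0]
    col_idx = np.arange(size[1]) * image.shape[1] // size[1]
    return image[row_idx][:, col_idx]


# Synthetic stand-ins: a 1896 x 1530 radiograph and a rectangular "lung" mask.
cxr = np.zeros((1896, 1530), dtype=np.uint8)
mask = np.zeros_like(cxr, dtype=bool)
mask[400:1400, 200:1300] = True

roi = crop_lung_region(cxr, mask)   # padded bounding box around the mask
model_input = resize_nearest(roi)   # 380 x 380 array ready to be saved
```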
EfficientNetB4 35 networks pre-trained on ImageNet were used in this work as base models. The input image size of EfficientNetB4 was set to 380 × 380. We used weights from ImageNet (transfer learning) to initialize the network. The classifier in the base model was replaced with the following; (1) was performed to the EfficientNetB4 model with 12 labels. Initially, all layers except the Batch Normalization layers in the EfficientNetB4 model were set as non-trainable, because the mean and variance of the training dataset differed from those of ImageNet 36 , and the network was initialized with ImageNet weights. Training was performed using Sigmoid Focal Cross Entropy as the loss function and Adam as the optimizer with the following parameters: β1 = 0.9, β2 = 0.999, ϵ = 1 × 10⁻⁷ and learning rate 1 × 10⁻⁶.
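For readers unfamiliar with the loss named above, the sketch below shows a NumPy version of sigmoid focal cross entropy for multi-label targets. The values alpha = 0.25 and gamma = 2.0 are assumed common defaults, since the study does not report the ones it used, and the function returns one loss per label rather than any particular library's reduction.

```python
import numpy as np


def sigmoid_focal_cross_entropy(y_true, logits, alpha=0.25, gamma=2.0):
    """Per-label focal loss: alpha_t * (1 - p_t)^gamma * BCE(y, sigmoid(logit)).

    alpha and gamma are assumed values, not taken from the study.
    """
    y_true = np.asarray(y_true, dtype=float)
    logits = np.asarray(logits, dtype=float)
    prob = 1.0 / (1.0 + np.exp(-logits))
    # Numerically stable binary cross entropy computed from logits.
    bce = np.maximum(logits, 0) - logits * y_true + np.log1p(np.exp(-np.abs(logits)))
    p_t = y_true * prob + (1 - y_true) * (1 - prob)        # probability of the true label
    alpha_t = y_true * alpha + (1 - y_true) * (1 - alpha)  # class balancing weight
    return alpha_t * (1 - p_t) ** gamma * bce


# One image, three labels: an easy positive, a hard positive, an easy negative.
labels = np.array([1.0, 1.0, 0.0])
scores = np.array([4.0, -4.0, -4.0])
per_label_loss = sigmoid_focal_cross_entropy(labels, scores)
```

The modulating factor (1 - p_t)^gamma shrinks the loss of confidently correct labels, which is one reason focal loss is often chosen when most labels are negative for most images.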
The network was trained for 50 epochs (training session 1) with mini-batches of 4 samples and Early Stopping on validation loss with patience set to 3. The mini-batches were shuffled on each epoch to randomize training and decrease overfitting, and the model weights were saved in hdf5 format. In training session 2, we re-initialized the network with the saved model weights from training session 1, unfroze the top 31 layers and retrained the network for 100 epochs with a learning rate reduction of 1 × 10⁻¹, keeping β1, β2, ϵ and mini-batch parameters unchanged, and saved the trained model weights in hdf5 format.
In training session 3, we initialized EfficientNetB4 with the weights from training session 2 and fine-tuned the top 150 layers of the network for 150 epochs with learning rate 1 × 10⁻⁸, keeping β1, β2, ϵ and mini-batch parameters unchanged, and saved the trained model weights in hdf5 format. In the last training session, the network was initialized with the weights from training session 3 and the top 242 layers were fine-tuned for 200 epochs with learning rate 1 × 10⁻⁹; the other hyperparameters were unaltered.
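The four training sessions can be summarised as a small schedule, sketched below in plain Python. Two points are assumptions rather than statements from the study: the session 2 learning rate is taken to be 1e-7 (the session 1 rate multiplied by the stated reduction of 1 × 10⁻¹), and Batch Normalization layers are assumed to stay trainable in every session. The names Session, SCHEDULE and trainable_flags are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    name: str
    unfrozen_top_layers: Optional[int]  # None: only Batch Normalization layers train
    max_epochs: int
    learning_rate: float


SCHEDULE = [
    Session("session 1", None, 50, 1e-6),
    Session("session 2", 31, 100, 1e-7),   # assumed: 1e-6 reduced by a factor of 0.1
    Session("session 3", 150, 150, 1e-8),
    Session("session 4", 242, 200, 1e-9),
]


def trainable_flags(layer_is_batch_norm, unfrozen_top_layers):
    """Return one trainable flag per layer, ordered from input to output.

    Assumption: Batch Normalization layers remain trainable throughout, and
    the top N layers (closest to the output) are additionally unfrozen.
    """
    n = len(layer_is_batch_norm)
    top = unfrozen_top_layers or 0
    first_unfrozen = max(n - top, 0)
    return [bool(is_bn) or i >= first_unfrozen
            for i, is_bn in enumerate(layer_is_batch_norm)]


# Toy network of six layers, two of which are Batch Normalization layers.
toy_layers = [False, True, False, False, True, False]
flags_per_session = {
    s.name: trainable_flags(toy_layers, s.unfrozen_top_layers) for s in SCHEDULE
}
```

In a deep learning framework, each session would load the previous session's saved weights, apply the flags to the model's layers, recompile with the session's learning rate and train with early stopping.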
Training EfficientNetB4 model with 44 classes. The final saved weights from training session 4 were used to initialize the EfficientNetB4 network, and the classifier head was changed to 44 labels. The network was fed the dataset with 44 class labels, and the training strategy adopted for the 12 class EfficientNetB4 model was repeated.
Evaluation metrics. The model was tested using the intramural holdout test set and the extramural test set (MC collection). The following metrics were used to evaluate the classification performance of the model on the test sets: (1) Sensitivity, (2) Specificity, (3) Area Under the Receiver Operating Characteristic curve (AUC), (4) Accuracy, (5) Precision and (6) F1-Score. In the MC collection, classes with zero data points were excluded from the Sensitivity, AUC, Precision and F1-Score analysis. Confidence Intervals (CI) for AUC were calculated using the Hanley & McNeil method 37 , and CI for sensitivity and specificity were obtained using the Wilson Score 38 method. Statistical analysis was done using Python 3.7 statistical libraries, and a P-value below 0.05 was considered statistically significant.
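The two interval methods named above have closed forms, sketched here with the Python standard library only. The function names and the example counts are hypothetical; z = 1.96 corresponds to a 95% interval, and clipping the AUC interval to [0, 1] is a convenience choice of this sketch.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity); None where the denominator is zero."""
    sens = tp / (tp + fn) if (tp + fn) else None  # undefined without positives
    spec = tn / (tn + fp) if (tn + fp) else None  # undefined without negatives
    return sens, spec


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion."""
    if total == 0:
        return None
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def hanley_mcneil_interval(auc, n_pos, n_neg, z=1.96):
    """AUC interval from the Hanley and McNeil (1982) standard error."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc * auc / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_pos - 1) * (q1 - auc * auc)
                + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    se = math.sqrt(variance)
    return max(auc - z * se, 0.0), min(auc + z * se, 1.0)


# Hypothetical counts for one abnormality class.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=40, fp=10)
sens_ci = wilson_interval(45, 50)
auc_ci = hanley_mcneil_interval(0.90, n_pos=50, n_neg=50)
```

Returning None for a class without positives mirrors the exclusion of zero-count classes from the sensitivity analysis described above.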
Visualisation. We generated Grad-CAM heatmaps for the trained network to understand the network and to visualise how it detects abnormalities in the CXR. This technique highlights the discriminative image regions that weigh most in the classification of the abnormalities. Grad-CAM uses deeper, more abstract feature maps to generate the heatmap, which typically produces better localisation, but there is a trade-off in image quality because the pooling layers lead to coarse localisation of abnormalities. In this work we generated Grad-CAM heatmaps for the EfficientNetB4 multi-label network trained with 44 classes. The generated heatmaps were analysed and interpreted by the radiologist (HG).
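The core Grad-CAM computation can be sketched in a few lines of NumPy, given the activations of one convolutional layer and the gradients of a class score with respect to them. How those two arrays are obtained depends on the framework, and which layer the study used is not stated; the function name and the toy arrays below are hypothetical.

```python
import numpy as np


def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap for one class from one convolutional layer.

    feature_maps: activations A_k with shape (H, W, K)
    gradients:    d(class score) / dA_k with the same shape
    """
    # alpha_k: gradients averaged over the spatial dimensions, shape (K,).
    weights = gradients.mean(axis=(0, 1))
    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # scale to [0, 1] for display


# Toy 2 x 2 layer with two channels: channel 0 supports the class, channel 1 opposes it.
activations = np.zeros((2, 2, 2))
activations[0, 0, 0] = 1.0
activations[1, 1, 1] = 1.0
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))], axis=-1)
heatmap = grad_cam(activations, grads)
```

The heatmap has the spatial size of the chosen layer, so it must be upsampled to the input resolution before being overlaid on the CXR, which is the source of the coarse localisation mentioned above.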

Strengths and limitations.
This study has several strengths. We used the EfficientNetB4 architecture, and the CXR interpretations and findings were peer validated by a pulmonologist and a radiologist, with inter-reader agreement measured. Active tuberculosis manifestations were classified lung zone wise and lung wise. We applied domain specific data augmentations after a detailed literature review and expert opinion from a radiologist. We employed a cross population dataset (MC collection) to evaluate model performance.
One of the primary limitations of the study was the limited number of data points in certain classes; we excluded abnormalities with a prevalence of less than 10%. All CXR were read by a single radiologist rather than by multiple radiologists or pulmonologists. Although a sample of findings was peer validated, there may have been some degree of misclassification of the visual signals in the chest X-rays.

Conclusions
Digital technologies, especially Artificial Intelligence based solutions, can accelerate the screening, triaging and diagnosis of tuberculosis by automating radiology. Deep learning algorithms can aid in the interpretation of chest X-ray findings such as parenchymal and pleural abnormalities. In this work we focused on the classification of active pulmonary tuberculosis manifestations both image wise and lung zone wise. Our study demonstrates that a deep learning network can classify active tuberculosis manifestations with remarkable accuracy, both image wise and lung zone wise. Classification of abnormalities along with Grad-CAM visualisation would help improve the precision and accuracy of the detection of tuberculosis manifestations and expedite screening and triaging in resource constrained settings where trained technologists or expert radiologists are lacking or overburdened. To the best of our knowledge, this is the first work to classify thirty abnormalities categorized into upper, mid and lower zones of the left and right lung, and fourteen abnormalities categorized lung wise (left and right). Further research on the classification of the various abnormalities of active tuberculosis is necessary, as literature in this domain is scarce.
Ethics approval and consent. Ethical approval for this study was obtained from the Institutional Ethics committee for Observational studies of Jawaharlal Institute of Postgraduate Medical Education and Research (JIPMER), Puducherry, India (JIPMER ethics committee number JIP/IEC/2019/533). Waiver of written informed consent was approved by JIPMER institutional ethics committee as all data sources used (patient demography, laboratory records, and chest X-ray) were previously available, and no patients needed to be contacted. Additionally, all data were collected anonymously and de-identified using study identifier before reading by the radiologist, model development, validation, and testing. All methods were carried out in accordance with Indian Council of Medical Research (ICMR) and International Committee on Harmonization of Good Clinical Practice (ICH-GCP) guidelines and regulations.

Data availability
The raw patient image data that support the findings of this work are taken from an ongoing study and, due to specific institutional requirements governing privacy protection, the dataset used in this work is not publicly available. The external test dataset used in this study, Montgomery, is available at https://data.lhncbc.nlm.nih.gov/public/Tuberculosis-Chest-X-ray-Datasets/Montgomery-County-CXR-Set/MontgomerySet/index.html.